Feature subset selection for logistic regression via mixed integer optimization

Authors

  • Toshiki Sato
  • Yuichi Takano
  • Ryuhei Miyashiro
  • Akiko Yoshise
Abstract

This paper presents a method for selecting a subset of features for a logistic regression model. Information criteria, such as the Akaike information criterion and the Bayesian information criterion, are used as goodness-of-fit measures. Using a piecewise linear approximation, the feature subset selection problem is formulated as a mixed integer linear optimization problem that can be solved with standard mathematical optimization software. Computational experiments show that the proposed method yields better solution quality than common stepwise methods.
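The abstract does not say how the piecewise linear approximation is built. The sketch below illustrates one common construction, offered here as an assumption rather than as the authors' exact formulation: the convex logistic loss f(v) = log(1 + exp(-v)) is bounded from below by the maximum of a few tangent lines, each of which could become a linear constraint in a mixed integer linear model. The tangent points, the function names, and the AIC helper are all illustrative choices, not taken from the paper.

```python
import math


def logistic_loss(v):
    """Logistic loss f(v) = log(1 + exp(-v)), evaluated in a numerically stable way."""
    if v >= 0:
        return math.log1p(math.exp(-v))
    return -v + math.log1p(math.exp(v))


def tangent(vk):
    """Return (slope, intercept) of the tangent line to f at the point vk."""
    slope = -1.0 / (1.0 + math.exp(vk))  # f'(v) = -1 / (1 + exp(v))
    intercept = logistic_loss(vk) - slope * vk
    return slope, intercept


def pwl_loss(v, points):
    """Piecewise linear underestimator of f: the maximum over tangent lines.

    In an optimization model this would appear as t >= slope_k * v + intercept_k
    for every tangent point k, with t minimized (an assumed modelling pattern).
    """
    return max(s * v + c for s, c in (tangent(p) for p in points))


def aic(neg_log_likelihood, n_params):
    """Akaike information criterion: 2 * (negative log-likelihood) + 2 * (number of parameters)."""
    return 2.0 * neg_log_likelihood + 2.0 * n_params


if __name__ == "__main__":
    points = [-2.0, -1.0, 0.0, 1.0, 2.0]  # hypothetical tangent points
    for v in (-3.0, -0.5, 0.0, 0.5, 3.0):
        print(v, logistic_loss(v), pwl_loss(v, points))
```

Because the loss is convex, every tangent line lies on or below it, so the approximation never overestimates the loss and is exact at the tangent points. Adding more tangent points tightens the approximation at the cost of more linear constraints.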


Related articles

Piecewise-Linear Approximation for Feature Subset Selection in a Sequential Logit Model

This paper concerns a method of selecting a subset of features for a sequential logit model. Tanaka and Nakagawa (2014) proposed a mixed integer quadratic optimization formulation for solving the problem based on a quadratic approximation of the logistic loss function. However, since there is a significant gap between the logistic loss function and its quadratic approximation, their fo...


Regression under a Modern Optimization Lens

In the last twenty-five years (1990-2014), algorithmic advances in integer optimization combined with hardware improvements have resulted in an astonishing 200 billion factor speedup in solving mixed integer optimization (MIO) problems ([16], [85], [104]). The common mindset of MIO as theoretically elegant but practically irrelevant is no longer justified. In this thesis, we propose a methodolo...


Network Intrusion Detection through Discriminative Feature Selection by Using Sparse Logistic Regression

An intrusion detection system (IDS) is a well-known and effective component of network security that provides security and safety for transactions on network systems. Most earlier research has addressed difficulties such as overfitting, feature redundancy, high-dimensional features and a limited number of training samples, but not feature selection. We approach the problem of feature selectio...


The Discrete Dantzig Selector: Estimating Sparse Linear Models via Mixed Integer Linear Optimization

We propose a new high-dimensional linear regression estimator: the Discrete Dantzig Selector, which minimizes the number of nonzero regression coefficients, subject to a budget on the maximal absolute correlation between the features and residuals. We show that the estimator can be expressed as a solution to a Mixed Integer Linear Optimization (MILO) problem, a computationally tractable framewo...


Towards Feature Selection in Networks

Traditional feature selection methods assume that the data are independent and identically distributed (i.i.d.). In the real world, tremendous amounts of data are distributed in a network. Existing feature selection methods are not suited for networked data because the i.i.d. assumption no longer holds. This motivates us to study feature selection in a network. In this paper, we present a supervis...



Journal:
  • Comp. Opt. and Appl.

Volume 64

Publication date: 2016